Installing and importing libraries¶

In [1]:
!pip install geopy
Requirement already satisfied: geopy in c:\users\brahi\anaconda3\lib\site-packages (2.4.1)
Requirement already satisfied: geographiclib<3,>=1.52 in c:\users\brahi\anaconda3\lib\site-packages (from geopy) (2.0)
In [2]:
import pandas as pd
import folium 
from folium import plugins
import seaborn as sns
import plotly.graph_objects as go
import plotly.express as px
import matplotlib.pyplot as plt
import plotly.io as pio
pio.renderers.default = "notebook"

Data exploration¶

In [4]:
data= pd.read_csv('hop.csv')
In [5]:
data
Out[5]:
Liste nominative des établissements hospitaliers par catégorie 2022 Unnamed: 1 Unnamed: 2 Unnamed: 3 Unnamed: 4 Unnamed: 5 Unnamed: 6 Unnamed: 7
0 NaN NaN NaN NaN NaN NaN NaN NaN
1 Région Delegation Commune Etablissement hospitalier Catégorie NaN Liste des abréviations NaN
2 Tanger-Tetouan-Al Hoceima Al Hoceima Al Hoceima (Mun.) Mohamed V HP NaN HP Hôpital Provincial/Préfectoral
3 Tanger-Tetouan-Al Hoceima Al Hoceima Al Hoceima (Mun.) C. d'oncologie d'Al Hoceima CRO NaN HR Hôpital Régional
4 Tanger-Tetouan-Al Hoceima Al Hoceima Imzouren (Mun.) Imzouren HPr NaN HIR Hospital Interrégional
... ... ... ... ... ... ... ... ...
167 Laayoune-Sakia El Hamra Es Semara Es-semara (Mun.) Es-Smara HP NaN NaN NaN
168 Laayoune-Sakia El Hamra Laayoune Laayoune (Mun.) My Hassan Ben El Mehdi HR NaN NaN NaN
169 Laayoune-Sakia El Hamra Laayoune Laayoune (Mun.) Hassan II HR NaN NaN NaN
170 Laayoune-Sakia El Hamra Laayoune Laayoune (Mun.) Laayoune CRO NaN NaN NaN
171 Eddakhla-Oued Eddahab Oued Ed-Dahab Dakhla (Mun.) Hassan II HR NaN NaN NaN

172 rows × 8 columns

In [7]:
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 172 entries, 0 to 171
Data columns (total 8 columns):
 #   Column                                                               Non-Null Count  Dtype  
---  ------                                                               --------------  -----  
 0   Liste nominative des établissements hospitaliers par catégorie 2022  171 non-null    object 
 1   Unnamed: 1                                                           171 non-null    object 
 2   Unnamed: 2                                                           171 non-null    object 
 3   Unnamed: 3                                                           171 non-null    object 
 4   Unnamed: 4                                                           171 non-null    object 
 5   Unnamed: 5                                                           0 non-null      float64
 6   Unnamed: 6                                                           9 non-null      object 
 7   Unnamed: 7                                                           8 non-null      object 
dtypes: float64(1), object(7)
memory usage: 10.9+ KB

Observation: the DataFrame has 172 rows and 8 columns, seven of which have placeholder "Unnamed" headers¶

In [9]:
data.isna().sum()
Out[9]:
Liste nominative des établissements hospitaliers par catégorie 2022      1
Unnamed: 1                                                               1
Unnamed: 2                                                               1
Unnamed: 3                                                               1
Unnamed: 4                                                               1
Unnamed: 5                                                             172
Unnamed: 6                                                             163
Unnamed: 7                                                             164
dtype: int64

Observation: of the 8 columns, 'Unnamed: 5' is entirely null, and the first row (index 0) is entirely null as well¶
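The placeholder headers and the empty first row come from the title line and the blank line that precede the real header in the file. A minimal sketch of handling this at load time, assuming a file laid out like the hypothetical in-memory sample below (the real 'hop.csv' may differ, so the `header` value is an assumption to verify against the file):

```python
import io
import pandas as pd

# Hypothetical miniature of the raw file: a title line, an empty row,
# the real header row, then data, with an empty trailing column.
raw_csv = io.StringIO(
    "Liste nominative 2022,,,\n"
    ",,,\n"
    "Région,Commune,Catégorie,\n"
    "Tanger-Tetouan-Al Hoceima,Al Hoceima (Mun.),HP,\n"
    "Laayoune-Sakia El Hamra,Laayoune (Mun.),HR,\n"
)

# header=2 takes the third physical line as column names and skips the lines before it.
sample = pd.read_csv(raw_csv, header=2)

# Drop columns that contain no values at all (the analogue of 'Unnamed: 5').
sample = sample.dropna(axis=1, how="all")

print(sample.columns.tolist())
```

This only moves the header handling into `read_csv`; the legend columns would still need to be dropped separately.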

Data cleaning and feature engineering¶

#### I dropped three columns for the following reasons:
* Unnamed: 5: it is entirely empty.
* Unnamed: 6: it doesn't add any value.
* Unnamed: 7: it holds the meaning of each abbreviation, which matters, but its rows are not aligned with the categories in Unnamed: 4, so I drop it here and rebuild it through the mapping below.
#### Mapping process: I created a dict that maps each category abbreviation to its meaning, then applied it with the map() function.
In [13]:
data = data.drop(['Unnamed: 5','Unnamed: 6','Unnamed: 7'],axis=1)                                                            
In [14]:
columns_names=list(data.iloc[1])
data.columns = columns_names
hopital =  data.drop([0,1])
In [15]:
categories_mapping = {'HP':'Hôpital Provincial/Préfectoral', 'CRO' : "Centre Régional d'Oncologie", 'HPr':'Hôpital de Proximité', 
                      'HPsyR':'Hôpital Psychiatrique Régional', 'HIR':'Hospital Interrégional', 'CPU':'Centre Psychiatrique Universitaire'
                      ,'HPsyP':'Hôpital Psychiatrique Provincial/préfectoral', 'HR':'Hôpital Régional'}
hopital['catégorie signification'] = hopital['Catégorie'].map(categories_mapping)
In [16]:
hopital.isna().sum() # recheck for null values
Out[16]:
Région                       0
Delegation                   0
Commune                      0
Etablissement hospitalier    0
Catégorie                    0
catégorie signification      0
dtype: int64
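The null check above matters because `map()` silently returns NaN for any code missing from the dict. A minimal sketch of listing such codes explicitly, using a shortened copy of the mapping and a made-up code ("XYZ") that is not in the dataset:

```python
import pandas as pd

# Shortened copy of the mapping; "XYZ" is a hypothetical code used only for illustration.
labels = {
    "HP": "Hôpital Provincial/Préfectoral",
    "CRO": "Centre Régional d'Oncologie",
    "HPr": "Hôpital de Proximité",
}
codes = pd.Series(["HP", "CRO", "HPr", "XYZ"])

# Codes absent from the dict become NaN after map().
mapped = codes.map(labels)

# Collect the distinct codes that were not translated.
unmapped = sorted(codes[mapped.isna()].unique())
print(unmapped)
```

An empty list here is equivalent to the zero count reported for 'catégorie signification'.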
#### I created a new column, 'Type Administratif', which shows whether the commune is a 'Municipalité' or an 'Arrondissement'.
Values in the Commune column follow the format Al Hoceima (Arrond.) / Al Hoceima (Mun.), where (Arrond.) stands for Arrondissement and (Mun.) stands for Municipalité, so I extract these two labels and store them in the new column.
In [18]:
#creating Type Administratif column (urban/rural)
hopital['Type Administratif'] = hopital['Commune'].str.extract(r'\((.*?)\)')[0]
hopital['Type Administratif'] = hopital['Type Administratif'].replace(['Mun.', 'Arrond.'], ['Municipalité', 'Arrondissement'])
In [19]:
# removing the administrative type from the 'Commune' column
hopital['Commune'] = hopital['Commune'].str.replace(r' \(Mun\.\)', '', regex=True)
hopital['Commune'] = hopital['Commune'].str.replace(r' \(Arrond\.\)', '', regex=True)
In [20]:
hopital
Out[20]:
Région Delegation Commune Etablissement hospitalier Catégorie catégorie signification Type Administratif
2 Tanger-Tetouan-Al Hoceima Al Hoceima Al Hoceima Mohamed V HP Hôpital Provincial/Préfectoral Municipalité
3 Tanger-Tetouan-Al Hoceima Al Hoceima Al Hoceima C. d'oncologie d'Al Hoceima CRO Centre Régional d'Oncologie Municipalité
4 Tanger-Tetouan-Al Hoceima Al Hoceima Imzouren Imzouren HPr Hôpital de Proximité Municipalité
5 Tanger-Tetouan-Al Hoceima Al Hoceima Targuist Targuist HPr Hôpital de Proximité Municipalité
6 Tanger-Tetouan-Al Hoceima Chefchaouen Chefchaouen Mohamed V HP Hôpital Provincial/Préfectoral Municipalité
... ... ... ... ... ... ... ...
167 Laayoune-Sakia El Hamra Es Semara Es-semara Es-Smara HP Hôpital Provincial/Préfectoral Municipalité
168 Laayoune-Sakia El Hamra Laayoune Laayoune My Hassan Ben El Mehdi HR Hôpital Régional Municipalité
169 Laayoune-Sakia El Hamra Laayoune Laayoune Hassan II HR Hôpital Régional Municipalité
170 Laayoune-Sakia El Hamra Laayoune Laayoune Laayoune CRO Centre Régional d'Oncologie Municipalité
171 Eddakhla-Oued Eddahab Oued Ed-Dahab Dakhla Hassan II HR Hôpital Régional Municipalité

170 rows × 7 columns
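The extraction and the clean-up of the Commune column can also be done in a single pass. A minimal sketch, assuming the suffix always sits at the end of the value; the sample series is illustrative, and "Commune X" is a hypothetical value with no suffix:

```python
import pandas as pd

communes = pd.Series([
    "Al Hoceima (Mun.)",
    "Imzouren (Mun.)",
    "Al Hoceima (Arrond.)",
    "Commune X",  # hypothetical value without a suffix
])

# One pattern with two named groups: the commune name and the optional suffix in parentheses.
parts = communes.str.extract(r"^(?P<name>.*?)(?:\s*\((?P<kind>[^)]*)\))?$")

# Translate the abbreviations; values without a suffix stay NaN.
admin_type = parts["kind"].map({"Mun.": "Municipalité", "Arrond.": "Arrondissement"})

print(parts["name"].tolist())
```

Rows without a parenthesised suffix end up with a missing administrative type, which is worth counting before grouping on that column.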

In [22]:
hopital_region=pd.DataFrame(hopital.groupby('Région')['Etablissement hospitalier'].count())
hopital_region 
Out[22]:
Etablissement hospitalier
Région
Béni Mellal-Khénifra 12
Casablanca-Settat 27
Drâa-Tafilalet 11
Eddakhla-Oued Eddahab 1
Fès-Meknès 21
Guelmim-Oued Noun 5
Laayoune-Sakia El Hamra 5
Marrakech-Safi 20
Oriental 16
Rabat-Salé-Kénitra 20
Souss-Massa 9
Tanger-Tetouan-Al Hoceima 23

Visualisation¶

#### I created a table with the number of hospitals, the approximate population, and the estimated deaths and births in 2022, grouped by region. The purpose of this table:
  • Compare the distribution of hospitals across the 12 regions.
  • Compare the population across the 12 regions.
  • Investigate the correlation between population size and the number of hospitals.
  • Investigate the correlation between population size and the estimated number of deaths.
  • Investigate the correlation between population size and the estimated number of births.

Note: The population column contains estimated values derived from multiple sources and may not accurately represent the population in 2022.¶

##### Calculations for Deaths and Births
  • Estimated Deaths (2022):
    We used a crude death rate of 6.6 deaths per 1,000 people for Morocco in 2022.
    For each region, we calculated:
    Total Deaths = (Population * 6.6) / 1,000
    Example: For Tanger-Tétouan-Al Hoceïma (3,651,427 people), deaths = (3,651,427 * 6.6) / 1,000 = 24,099.

  • Estimated Births (2022):
    We used a crude birth rate of 17.16 births per 1,000 people for Morocco in 2022.
    For each region, we calculated:
    Total Births = (Population * 17.16) / 1,000
    Example: For Tanger-Tétouan-Al Hoceïma (3,651,427 people), births = (3,651,427 * 17.16) / 1,000 ≈ 62,658.
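The two formulas above share one shape, so they can be checked with a small helper. This is a sketch only; the function name `estimate_events` is hypothetical and is not used elsewhere in the notebook:

```python
def estimate_events(population: int, rate_per_1000: float) -> float:
    """Apply a crude rate (events per 1,000 people) to a population."""
    return population * rate_per_1000 / 1000


# Tanger-Tétouan-Al Hoceïma, with the crude rates quoted above.
deaths = estimate_events(3_651_427, 6.6)
births = estimate_events(3_651_427, 17.16)

print(round(deaths), round(births))
```

The rounded results agree with the Tanger-Tétouan-Al Hoceïma row of the regional table.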

In [39]:
import pandas as pd

# Approximate population per region (estimates gathered from multiple sources)
population_2022 = {
    "Tanger-Tétouan-Al Hoceïma": 3651427,
    "Lâayoune-Sakia El Hamra": 370000,
    "Oued Ed-Dahkla-Oued Eddahab": 130000,
    "Casablanca-Settat": 6950000,
    "Rabat-Salé-Kénitra": 4850000,
    "Fès-Meknès": 4150000,
    "Marrakech-Safi": 4550000,
    "Drâa-Tafilalet": 1650000,
    "Souss-Massa": 2900000,
    "Guelmim-Oued Noun": 430000,
    "Béni Mellal-Khénifra": 3550000,
    "Oriental": 2450000
}


hopital_region.index = hopital_region.index.str.replace('Tanger-Tetouan-Al Hoceima', 'Tanger-Tétouan-Al Hoceïma')
hopital_region.index = hopital_region.index.str.replace('Laayoune-Sakia El Hamra', 'Lâayoune-Sakia El Hamra')
hopital_region.index = hopital_region.index.str.replace('Eddakhla-Oued Eddahab', 'Oued Ed-Dahkla-Oued Eddahab')

hopital_region['Population (approximativement)'] = hopital_region.index.map(population_2022)

hopital_region['Estimated Deaths (2022)'] = (hopital_region['Population (approximativement)'] * 6.6) / 1000

hopital_region['Estimated Births (2022)'] = (hopital_region['Population (approximativement)'] * 17.16) / 1000

display(hopital_region)
Etablissement hospitalier Population (approximativement) Estimated Deaths (2022) Estimated Births (2022)
Région
Béni Mellal-Khénifra 12 3550000 23430.0000 60918.00000
Casablanca-Settat 27 6950000 45870.0000 119262.00000
Drâa-Tafilalet 11 1650000 10890.0000 28314.00000
Oued Ed-Dahkla-Oued Eddahab 1 130000 858.0000 2230.80000
Fès-Meknès 21 4150000 27390.0000 71214.00000
Guelmim-Oued Noun 5 430000 2838.0000 7378.80000
Lâayoune-Sakia El Hamra 5 370000 2442.0000 6349.20000
Marrakech-Safi 20 4550000 30030.0000 78078.00000
Oriental 16 2450000 16170.0000 42042.00000
Rabat-Salé-Kénitra 20 4850000 32010.0000 83226.00000
Souss-Massa 9 2900000 19140.0000 49764.00000
Tanger-Tétouan-Al Hoceïma 23 3651427 24099.4182 62658.48732
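Because the population lookup depends on the region names matching exactly, a missed rename shows up only as NaN in the population column. A minimal sketch of catching that, using two regions from the table and deliberately omitting one rename (the frame and column names here are illustrative, not the notebook's):

```python
import pandas as pd

counts = pd.DataFrame(
    {"hospitals": [23, 5]},
    index=["Tanger-Tetouan-Al Hoceima", "Laayoune-Sakia El Hamra"],
)
population = {
    "Tanger-Tétouan-Al Hoceïma": 3651427,
    "Lâayoune-Sakia El Hamra": 370000,
}

# Only the first spelling is harmonized; the second is left out on purpose.
counts = counts.rename(index={"Tanger-Tetouan-Al Hoceima": "Tanger-Tétouan-Al Hoceïma"})
counts["population"] = counts.index.map(population)

# Regions whose name found no match in the population dict.
missing = counts.index[counts["population"].isna()].tolist()
print(missing)
```

An empty `missing` list confirms that every region received a population value.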
In [41]:
fig = go.Figure(go.Bar(
    x=hopital_region.index,
    y=hopital_region['Etablissement hospitalier'],
    text=hopital_region['Etablissement hospitalier'],
    textposition='auto',  # Automatically position text on bars
    marker_color=['#4B0081', '#6A0DAD', '#8A2BE2', '#9400D3']  # Darker color palette
))
fig.update_layout(yaxis_title = "Nombre d'Hopitaux", xaxis_title='Région', title="Etablissement hospitalier par Région (Approximativement)")
In [43]:
fig_pop = go.Figure(go.Bar(
    x=hopital_region.index,
    y=hopital_region['Population (approximativement)'],
    text=hopital_region['Population (approximativement)'],
    textposition='auto',  # Automatically position text on bars
    marker_color=['#4B0081', '#6A0DAD', '#8A2BE2', '#9400D3']  # Darker color palette
))
fig_pop.update_layout(yaxis_title = "Population", xaxis_title='Région', title="Population par Région (Approximativement)")
In [45]:
deaths_box  = px.box(hopital_region, x ='Estimated Deaths (2022)', title='Distribution des Décès Estimés (2022)')
deaths_box
In [46]:
births_box  = px.box(hopital_region, x ='Estimated Births (2022)', title= 'Distribution des Naissances Estimées (2022)')
births_box
In [48]:
fig_region_births_deaths = px.bar(hopital_region, x=hopital_region.index, y=['Estimated Births (2022)', 'Estimated Deaths (2022)'],
                                 title="Naissances et Décès Estimés par Région (2022)")
fig_region_births_deaths
In [51]:
correlation_pop_hop= px.scatter(hopital_region, y='Etablissement hospitalier', x='Population (approximativement)', trendline='ols',  # Ordinary Least Squares regression
                                title="Corrélation entre le Nombre d'Établissements Hospitaliers et la Population")
correlation_pop_hop
In [52]:
correlation_pop_death= px.scatter(hopital_region, y='Estimated Deaths (2022)', x='Population (approximativement)', trendline='ols',  # Ordinary Least Squares regression
                                title="Corrélation entre les Décès Estimés et la Population")
correlation_pop_death
In [53]:
correlation_pop_births= px.scatter(hopital_region, y='Estimated Births (2022)', x='Population (approximativement)', trendline='ols',  # Ordinary Least Squares regression
                                title="Corrélation entre les Naissances Estimées et la Population")
correlation_pop_births
In [54]:
correlation_hop_births= px.scatter(hopital_region, y='Estimated Births (2022)', x='Etablissement hospitalier', trendline='ols',  # Ordinary Least Squares regression
                                title="Corrélation entre le Nombre d'Établissements Hospitaliers et les Naissances")
correlation_hop_births
In [58]:
correlation_hop_deaths= px.scatter(hopital_region, y='Estimated Deaths (2022)', x='Etablissement hospitalier', trendline='ols',  # Ordinary Least Squares regression
                                title="Corrélation entre le Nombre d'Établissements Hospitaliers et le Nombre de Décès")
correlation_hop_deaths
In [61]:
correlation_matrix = hopital_region.corr()

plt.figure(figsize=(8, 6))

sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt='.2f')

plt.title('Correlation Matrix')

plt.show()
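One point to keep in mind when reading the matrix: 'Estimated Deaths (2022)' and 'Estimated Births (2022)' are the population multiplied by a constant rate, so their Pearson correlation with the population (and with each other) is 1 by construction and carries no independent information. A minimal sketch with four population values taken from the table; the short column names are illustrative:

```python
import pandas as pd

df = pd.DataFrame({"population": [3550000, 6950000, 1650000, 130000]})

# Same construction as in the notebook: a constant rate applied to the population.
df["deaths"] = df["population"] * 6.6 / 1000
df["births"] = df["population"] * 17.16 / 1000

# Pearson correlation of a column with a positive constant multiple of itself is 1.
corr = df.corr()
print(corr.round(2))
```

Only the hospital-count row and column of the heatmap reflect a relationship that is not built in by the formulas.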
#### I created a table grouped by the 'Type Administratif' and 'catégorie signification' columns, with the count as the third column.
#### Count: the number of hospitals of each category within each administrative type.
#### Purpose of this table:
* Compare the distribution of categories across the two administrative types (Arrondissement and Municipalité) using a stacked bar chart.
In [64]:
# Group the data by 'Type Administratif' and 'catégorie signification' and count the occurrences
rurale_urbaine = hopital.groupby(['Type Administratif', 'catégorie signification']).size().reset_index(name='Count')
rurale_urbaine
Out[64]:
Type Administratif catégorie signification Count
0 Arrondissement Centre Psychiatrique Universitaire 3
1 Arrondissement Hospital Interrégional 20
2 Arrondissement Hôpital Provincial/Préfectoral 9
3 Arrondissement Hôpital Psychiatrique Régional 1
4 Arrondissement Hôpital Régional 10
5 Arrondissement Hôpital de Proximité 4
6 Municipalité Centre Psychiatrique Universitaire 2
7 Municipalité Centre Régional d'Oncologie 4
8 Municipalité Hospital Interrégional 4
9 Municipalité Hôpital Provincial/Préfectoral 63
10 Municipalité Hôpital Psychiatrique Provincial/préfectoral 3
11 Municipalité Hôpital Régional 8
12 Municipalité Hôpital de Proximité 33
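The same counts can be read more easily side by side in a wide table, with one row per administrative type and one column per category. A minimal sketch using `pd.crosstab` on a small illustrative frame (the four rows are made up for the example, not taken from the dataset):

```python
import pandas as pd

# Illustrative rows only; the real input would be the 'hopital' DataFrame.
rows = pd.DataFrame({
    "Type Administratif": ["Municipalité", "Municipalité", "Arrondissement", "Municipalité"],
    "catégorie signification": [
        "Hôpital de Proximité",
        "Hôpital Régional",
        "Hôpital Régional",
        "Hôpital de Proximité",
    ],
})

# Rows: administrative type; columns: category; cells: number of hospitals.
wide = pd.crosstab(rows["Type Administratif"], rows["catégorie signification"])
print(wide)
```

Combinations that never occur appear as 0 in the wide table, whereas they are simply absent from the long groupby output.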
In [66]:
fig_cat_adm = px.bar(rurale_urbaine, x='Type Administratif', y ='Count',color='catégorie signification' )
fig_cat_adm

#### This map helps you find hospitals across Morocco, showing each hospital's region and whether it is in a rural or an urban area.
  • Markers: Colors show the region (for example, dark green for Marrakech-Safi).
  • Explore: Zoom in or out, and click a marker for more hospital info.
  • Legend: Look at the bottom-left to decode colors and icons.
In [648]:
from geopy.geocoders import Nominatim
from geopy.extra.rate_limiter import RateLimiter
import folium
import pandas as pd
import numpy as np
import random
import time

# 'hopital' is the cleaned DataFrame built in the cells above

# Step 1: Set up the geocoder with rate limiting and retries
geolocator = Nominatim(user_agent="hospital_map", timeout=10)
geocode = RateLimiter(geolocator.geocode, min_delay_seconds=1, max_retries=3, error_wait_seconds=2)

# Step 2: Define a color palette for regions
# Keys must match the spellings in hopital["Région"]; otherwise .get() below falls back to gray
region_colors = {
    "Tanger-Tetouan-Al Hoceima": "blue",
    "Laayoune-Sakia El Hamra": "red",
    "Eddakhla-Oued Eddahab": "green",
    "Casablanca-Settat": "purple",
    "Rabat-Salé-Kénitra": "orange",
    "Fès-Meknès": "pink",
    "Marrakech-Safi": "darkgreen",
    "Drâa-Tafilalet": "darkblue",
    "Souss-Massa": "darkred",
    "Guelmim-Oued Noun": "lightblue",
    "Béni Mellal-Khénifra": "lightgreen",
    "Oriental": "gray"
}

# Step 3: Geocode communes with a fallback to region coordinates
commune_coords = {}
country = "Morocco"
failed_communes = []

for commune in hopital["Commune"].unique():
    clean_commune = commune.replace(" (Arrond.)", "").replace(" (Mun.)", "")
    try:
        location = geocode(f"{clean_commune}, {country}")
        if location:
            commune_coords[commune] = (location.latitude, location.longitude)
        else:
            print(f"Couldn’t find coordinates for {clean_commune}, {country}")
            failed_communes.append(commune)
    except Exception as e:
        print(f"Error with {clean_commune}: {e}")
        failed_communes.append(commune)

# Fallback: Use region coordinates for failed communes
for commune in failed_communes:
    region = hopital[hopital["Commune"] == commune]["Région"].iloc[0]
    try:
        location = geocode(f"{region}, {country}")
        if location:
            # Add a larger offset to approximate commune location within region
            offset_lat = random.uniform(-0.1, 0.1)  # ~10 km
            offset_lon = random.uniform(-0.1, 0.1)
            commune_coords[commune] = (location.latitude + offset_lat, location.longitude + offset_lon)
            print(f"Using region fallback for {commune}: {commune_coords[commune]}")
        else:
            print(f"Couldn’t geocode region {region} for {commune}")
    except Exception as e:
        print(f"Error geocoding region {region} for {commune}: {e}")

# Step 4: Assign coordinates to each hospital with unique keys
hospital_coords = {}
hospital_groups = hopital.groupby("Commune")

for commune, group in hospital_groups:
    if commune in commune_coords:
        base_coord = commune_coords[commune]
        num_hospitals = len(group)
        if num_hospitals == 1:
            unique_key = f"{group.iloc[0]['Etablissement hospitalier']}_{commune}"
            hospital_coords[unique_key] = base_coord
        else:
            for _, row in group.iterrows():
                offset_lat = random.uniform(-0.001, 0.001)  # ~100 meters
                offset_lon = random.uniform(-0.001, 0.001)
                unique_key = f"{row['Etablissement hospitalier']}_{commune}"
                hospital_coords[unique_key] = (
                    base_coord[0] + offset_lat,
                    base_coord[1] + offset_lon
                )
    else:
        print(f"Commune {commune} still not geocoded after fallback.")

# Step 5: Create the map
if hospital_coords:
    latitudes = [coord[0] for coord in hospital_coords.values()]
    longitudes = [coord[1] for coord in hospital_coords.values()]
    mean_lat = np.mean(latitudes)
    mean_lon = np.mean(longitudes)

    m = folium.Map(location=[mean_lat, mean_lon], zoom_start=6)

    # Step 6: Plot each hospital with region color, admin type icon, and category in popup
    for _, row in hopital.iterrows():
        hospital_name = row["Etablissement hospitalier"]
        commune = row["Commune"]
        unique_key = f"{hospital_name}_{commune}"

        if unique_key in hospital_coords:
            coord = hospital_coords[unique_key]
            region = row["Région"]
            category = row["Catégorie"]
            signification = row["catégorie signification"]
            admin_type = row["Type Administratif"]

            # Assign color based on region
            color = region_colors.get(region, "gray")

            # Assign icon based on administrative type
            if admin_type == "Arrondissement":
                icon_name = "building"  # Urban
            elif admin_type == "Municipalité":
                icon_name = "home"  # Rural
            else:
                icon_name = "info-circle"  # Default (a Font Awesome name, since prefix="fa")

            # Add marker with detailed popup
            folium.Marker(
                location=coord,
                popup=f"{hospital_name}<br>Commune: {commune}<br>Région: {region}<br>Catégorie: {category} - {signification}<br>Type: {admin_type}",
                icon=folium.Icon(color=color, icon=icon_name, prefix="fa")
            ).add_to(m)

    # Step 7: Add a legend
    legend_html = '''
    <div style="position: fixed; bottom: 50px; left: 50px; z-index: 1000; padding: 10px; background-color: white; border: 2px solid gray; border-radius: 5px;">
        <h4>Legend</h4>
        <h5>Regions</h5>
        <ul>
            {}
        </ul>
        <h5>Administrative Type</h5>
        <ul>
            <li><i class="fa fa-building" style="color: black;"></i> Arrondissement (Urban)</li>
            <li><i class="fa fa-home" style="color: black;"></i> Municipalité (Rural)</li>
        </ul>
    </div>
    '''.format(''.join([f'<li><span style="color: {color};">■</span> {region}</li>' for region, color in region_colors.items()]))

    # Add Font Awesome for icons
    m.get_root().html.add_child(folium.Element(
        '<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/5.15.3/css/all.min.css">'
    ))

    # Add the legend to the map
    m.get_root().html.add_child(folium.Element(legend_html))

    # Step 8: Save the map
    m.save("hospital_map.html")
    print("Map saved as 'hospital_map.html'")
else:
    print("No hospitals were geocoded successfully.")
Couldn’t find coordinates for Sidi Moussa Lemhaya, Morocco
Using region fallback for Sidi Moussa Lemhaya: (33.38027961774928, -2.4998206318442966)
Map saved as 'hospital_map.html'
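The random offsets used for the region fallback and for communes with several hospitals change on every run, so marker positions are not reproducible. A minimal sketch of making them repeatable with a seeded generator; the helper name `jitter`, the seed value, and the base coordinate are assumptions made for illustration:

```python
import random


def jitter(coord, max_offset, rng):
    """Return coord shifted by a random offset of at most max_offset degrees per axis."""
    lat, lon = coord
    return (
        lat + rng.uniform(-max_offset, max_offset),
        lon + rng.uniform(-max_offset, max_offset),
    )


# Two generators with the same seed produce the same sequence of offsets.
rng_a = random.Random(42)
rng_b = random.Random(42)

base = (35.0, -5.0)  # hypothetical coordinate, not a geocoding result
first = jitter(base, 0.001, rng_a)
second = jitter(base, 0.001, rng_b)

print(first == second)
```

Creating one seeded `random.Random` instance before the loops and passing it to every offset computation would keep the saved map identical between runs, provided the geocoder returns the same coordinates.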
In [512]:
m